lecture one

on artificial intelligence and face recognition


1


xiaowan : sh-iao one

x.yi@arts.ac.uk

https://vimeo.com/user65401583

xiaowan-yi.com


2


(hopefully) Get you interested in AI

Clarify some notions around AI

Rethink numbers

Have an awesome face detector app on your iPhone, made by yourself


3


“a boring lecture on Thursday afternoon and everyone is late”


4


AI is fundamentally interesting…

Why?


5


How I got interested in AI

Cozy environment

Curious about ourselves (mind-blowing moments)

It’s NOT hard (easy to follow because it closely connects to our daily experience)

6


what is this sound?


https://www.youtube.com/watch?v=bbLDfueL7eU


7


bringing the vid…


how do you feel?


8


Once upon a time…


9


10


Pattern recognition: find regularity, enable prediction making


11


travel back to our era…


what do we have now?


12

The significance of pattern recognition to humanity

“We distinguished predator from prey; and poisonous plants from nourishing ones - enhancing our chance to live and reproduce, and passing on our genes. We used pattern recognition in astronomy and astrology, where different cultures, recognizing the patterns of stars in the skies, projected different symbols and pictures for constellations. We used it to predict the passing of the seasons, including how every culture determined that the passage of a comet was taken as an omen.”

— “When Knowledge Conquered Fear”, third episode of the documentary tv series Cosmos: A Spacetime Odyssey

13


Pattern recognition as an essential part of our experience:

Guess the bonfire sound, get on the right tube line (daily task solving), read my handwriting (language), appreciate music (art), etc.

Name more…


14


Artificial Intelligence

what is it??

“intelligence, made by humans”

it is still an unfinished goal

we usually hate anything artificial


15


intelligence{


16


Intelligence is a big bag word

It includes the ability to solve complex problems or make decisions with outcomes benefiting the actor

and many more…


17




Is intelligence exclusive to humans?


18


We have:

A gorilla uses a stick to test the depth of water

https://en.wikipedia.org/wiki/Tool_use_by_animals#:~:text=Chimpanzees%20are%20sophisticated%20tool%20users,in%20the%20Republic%20of%20Congo.


Fish makes art:

https://www.youtube.com/watch?v=VQr8xDk_UaY

Dog talks:

https://www.youtube.com/watch?v=QKQK7EIcq9Y


19

What is intelligence?

Intelligence is always specific; most of the time we are interested in human intelligence (for simplicity, we omit “human” in the rest of the slides)

What are we able to do with intelligence? Task solving, art making, etc. What else?


20




If we manage to build a machine that can solve tasks and make art, can we say we have achieved AI?


Intelligence = task solving + art making?



emotion


21


Thinking about intelligence is a way of appreciation

Simple things are hard


“Be the subject of your own thoughts”


22


}


23


artificial intelligence{


24


Here are some human-made concepts of subjects:

Engineering science:

make tools

— life gets easier


Natural science:

discover and explain phenomena

— curiosity satisfied


25


These two are actually intertwined

So is AI!


26


AI is a tool (like engineering science)


27


AI is also our attempt to understand intelligence better (natural science)


“We don’t think we have understood something unless we can build it fRoM sCrAtCh.”


28

From scratch: use machines, which are created by humans, not other organic beings we don’t fully understand yet

Then…

Where shall I start if I want to make machines intelligent, i.e. able to solve tasks and make art?


29


Recall:

Pattern recognition as an essential part of human experience:

Get on the right tube line (daily task solving), read my handwriting (language), appreciate music (art), etc.


30


Let’s make the machine do pattern recognition!


31

Let’s make the machine do pattern recognition!


Do you know how cool it is?


A few hours of computation by our little metal box

~=

Hundreds of thousands of years of human evolution and experience accumulation


Shout out to Patrick Winston https://youtu.be/Unzc731iCUY?t=2323


32


Here is what we’ve talked about so far:


34


}


35


Insert self introduction here :)


Research https://anonymous84654.github.io/RAVE_anonymous/

Drum and AI https://vimeo.com/93213203


36


Story time

What are AI researchers like? “sleeping”

Geoffrey Hinton, the godfather of DL, has recently been taking inspiration from what the purpose of sleep is and why we have dreams…

https://www.youtube.com/watch?v=2EDP4v-9TUA


37


“a lucid dream”


38


Studying AI is a thought-provoking process


And it will get us to know ourselves better (lots of fun facts to come…)


Enjoy !

39


Noodling time..


40



Intelligence is not exclusive to humans

Other species can also make and use tools, solve tasks and create art…

And now that machines can do something too,

what makes us us, then?

41



“humanity ~= human capability - artificial intelligence”


42



Is it true that we assume machine intelligence is always a subset of human intelligence?

Is it possible that machines can actually do things that we intrinsically cannot do? That is, are some machine intelligence capabilities beyond those of human intelligence?


43


Representation{


44


What is representation?


“descriptor” “features” “characteristics”


45

Why do we need to have the notion of representation:


  1. It is inevitable as a result of our “flaw”, more on this later


  2. It is also a powerful tool towards task solving


46


What is apple?


47


Pattern recognition question 1

What is apple as a fruit vs. Apple as a tech company?


48


(Efficient) representations:


49


Pattern recognition question 2

What is apple vs. pear?


50


(Efficient) representations:


51


Good representation simplifies our task

To excel at pattern recognition ~= To find a good representation

ANIME time ! DOMAIN EXPANSION

https://www.youtube.com/watch?v=nmvkhLz8t7I


52


(Efficient) representations:

Meet “papple” …

https://www.theguardian.com/lifeandstyle/wordofmouth/2012/may/21/the-papple-tasted-and-tested


53


To get out of ambiguity, just ask about the context


Why do you want to know if it is an apple or pear?


54


Representation is contextual

It depends on the problem given: different tasks have different efficient representations


55


Perhaps we can never describe/represent one thing exactly as it is, with nothing less and nothing more…

our natural language is doomed to fail (our “flaw”)


56


Representation


57


Another related notion

Abstraction:

Taking away irrelevant details, reducing the representation to essential characteristics


58


Joel’s slides on “what is computational thinking”

https://jgl.github.io/DiplomaInAppleDevelopment-AutumnWinter2022/codingOne/lecture_01.html#42


Lots of things are connected. Studying AI is a brilliant manifestation of computational thinking.

59


Numbers{


60


What is the domain where we can have almost perfect representation (aka without ambiguity)?


61


I have three pens

what is “I”

what is “have”

what is “pen”

what is “three”


62


We always need a “protocol” (like an agreement on how to interpret numbers, or like a dictionary for looking up a number’s meaning) when using numbers in real life.


65


Though numbers provide an almost perfect representation domain, that domain is “fictional”


In real life, we don’t see numbers on their own walking on the street


66


When we encounter numbers in the real world, there are always real-world meanings attached to them

We always need an interpretation guide (“protocol”) when using numbers in real life.


67

Why do I want to talk about numbers?


Our poor human-made machines can only deal with numbers

Numbers bring in maths, which is our DOMAIN EXPANSION

It is SUPER important to grasp the idea of using (numbers + protocol) to represent things, for doing fancy AI stuff


68


}

End of noodling,

Starting ordinary lecture mode…


69


70


scope of this module

how you can internalise the knowledge efficiently


78

how do we learn? - questions


“attention mechanism”


80


81


machine learning model


82


before diving into machine learning model…


83


“information era”


84

the information we receive from the world comes mainly from four categories:


can you think of any information that is not from the four categories? there are…

85

Terminology used by AI nerds:

data category = data modality


86


my mind-blowing moment:

information from any of these three categories (image, text and sound) can be represented by just a bunch of numbers

using numbers only


87


image in numbers:

two numbers for its width and height (how many pixels)

for each pixel, what the rgb values are
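here is a minimal sketch of that idea in Swift. the `TinyImage` type, its field names and the row-by-row pixel order are made up for illustration; real image formats arrange their numbers in their own ways.

```swift
// A tiny 2x2 image stored as plain numbers (hypothetical layout, for illustration only).
struct TinyImage {
    let width: Int        // how many pixels per row
    let height: Int       // how many rows
    let pixels: [[Int]]   // one [r, g, b] triple per pixel (each 0...255), row by row

    // Look up the rgb values of the pixel at column x, row y.
    func rgb(x: Int, y: Int) -> [Int] {
        return pixels[y * width + x]
    }
}

let image = TinyImage(
    width: 2,
    height: 2,
    pixels: [
        [255, 0, 0], [0, 255, 0],      // top row: red, green
        [0, 0, 255], [255, 255, 255],  // bottom row: blue, white
    ]
)

print(image.rgb(x: 1, y: 0))  // the top-right pixel: [0, 255, 0]
```

the whole picture is just two numbers for the size plus three numbers per pixel.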


88


language in numbers:

we will talk about this later

but for now just think about looking up a word in a dictionary

using page number and index

(also math itself is a language….)


89


sound in numbers:



this is a wav file of a drum beat, screenshot with a lot of zooming in

each dot represents a number
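a small sketch of the same idea, assuming a made-up recording of only 8 samples per second of a 1 Hz tone (real audio files use tens of thousands of samples per second). the names `sampleRate`, `frequency` and `samples` are illustrative, not from any audio library.

```swift
import Foundation

// Assumed settings for illustration: 8 samples per second, a 1 Hz tone.
let sampleRate = 8.0
let frequency = 1.0

// Each sample is one number: the height of the sound wave at that moment.
let samples: [Double] = (0..<8).map { n in
    sin(2.0 * Double.pi * frequency * Double(n) / sampleRate)
}

// Print every "dot" of the waveform as (index, value).
for (n, value) in samples.enumerated() {
    print(n, value)
}
```

one second of this sound is nothing more than the eight numbers in `samples`.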


90


why do we care about representing things in numbers?


91


machine learning model

? ?


92


what is a model


93

they are all nice tools, but each one sounds very different from the others!

94



97


“what are the input and output of this model” is always the first question to ask..

It also helps answer this question: what does this ML model do?


98


try this…

what does a speech recognition model do?

try answering using “given a <your educated guess on the input>, the speech recognition model generates <your educated guess on the output>”

image, text, audio, numbers


99


and try this…

what does a dog-or-cat image classification model do?

try answering using “given a <your guess on the input>, the dog-or-cat image classification model generates <your guess on the output>”

image, text, sound, numbers


100


How to shepherd the meaning of a bunch of numbers in the output?


How do we know what each output number represents?

During training (next unit), we specify which output number means what (the “protocol”)

this protocol stays consistent across the model’s life span

the protocol for interpreting each output number should be passed on to the model’s users


101

An example

Task context:

Use numbers to represent whether the image is a dog or cat

The number representation I come up with:

[0, 1]

The protocol I’m going to pass around:

hello this is a protocol created by covfefe for the dog-or-cat numeric representation and I can bullshit whatever I want here as long as I explain how to interpret [0, 1] somewhere are you with me

The first number (with index 0) in this array represents the probability of this image being a dog image

The second number (with index 1) in the array represents the probability of this image being a cat image
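A minimal sketch of that protocol written down as code. The `labels` array and the `interpret` function are hypothetical names made up for this example; the point is only that the numbers mean nothing until they are paired with the protocol.

```swift
// The protocol as code: index 0 means "dog", index 1 means "cat".
let labels = ["dog", "cat"]

// Pair each output number with its meaning and return the most probable label.
func interpret(_ output: [Double]) -> String {
    precondition(output.count == labels.count, "need one number per label")
    var best = 0
    for i in output.indices where output[i] > output[best] {
        best = i
    }
    return labels[best]
}

print(interpret([0, 1]))      // the example from this slide: cat
print(interpret([0.8, 0.2]))  // a made-up, less certain output: dog
```

If someone swapped the order of `labels` without telling the model users, the same numbers would be read as the opposite animal, which is why the protocol has to travel with the model.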

102


now try this


103


output

input



“a boring lecture where everyone is sleeping”


104

think of ML models as tools


105

coincidentally, Apple’s ML frameworks have a similar division too

how to make tools - CreateML

how to use tools - CoreML


106

also coincidentally, the “input and output” thinking of an ML model shows up in how Apple defines an ML model in its CoreML framework:



107


can you find this “input, process and output” mechanism in us as human beings?


https://www.youtube.com/watch?v=X5fD0Evny4w&t=36s


108


end of machine learning model introduction

question?


109


face detection model


110


while your memory is fresh..

what does a face detection model do?

try answering using “given a <your guess on the input>, the face detection model generates <your guess on the output>”

image, text, sound, numbers


111


face detection model


given an image (with or without faces, could be any),


112

the face detection model generates


the detected locations of faces


what can we use the model output (detected face locations) for?


113


what can we use the model output (detected face locations) for?


counting how many faces there are

drawing the bounding box of each detected face on the image

applying an emoji to cover the face


we will be building an app to achieve all of these in a minute!!!


114


go to app construction now…

or if time allows we can dive a bit deeper into the face detection model introduction


115


a face detection model does not generate output in the form of this nice green rectangular bounding box as you see


its output is actually a numeric representation of this green box (recall that we can represent all that amazing stuff in numbers?)


based on the number representation of the bounding box, we programme the computer to help us draw out this box


116


how is the detected face location, aka bounding box, represented in numbers?


117


an easier question

how is the location of a single point in an image represented in numbers?



119


now that we know how a single point is represented in numbers

a bounding box is nothing but a combination of its four corner points

once the four corners are known, we just sit back and let the computer draw the lines for us

120

Example of how to represent one bounding box in numbers:

Location of upper-left corner: [0, 0]

Location of upper-right corner: [20, 0]

Location of lower-left corner: [0, 40]

Location of lower-right corner: [20, 40]

One bounding box: [[0, 0], [20, 0], [0, 40], [20, 40]]

Don’t forget the protocol:

<insert your educated guess here>

121


Noodling time:

Given the representation of one bounding box: [[0, 0], [20, 0], [0, 40], [20, 40]]

(with <protocol same as in last slide>)

Can we infer the width and height of this box?
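One way to work it out, as a sketch. It assumes a protocol the slides leave open: corners ordered upper-left, upper-right, lower-left, lower-right, each written as [x, y] in pixels. The function name `size(of:)` is made up for this example.

```swift
// Assumed protocol: corners are ordered upper-left, upper-right, lower-left,
// lower-right, and each corner is [x, y] in pixels.
let box = [[0, 0], [20, 0], [0, 40], [20, 40]]

func size(of box: [[Int]]) -> (width: Int, height: Int) {
    let upperLeft = box[0]
    let upperRight = box[1]
    let lowerLeft = box[2]
    let width = upperRight[0] - upperLeft[0]   // difference in x along the top edge
    let height = lowerLeft[1] - upperLeft[1]   // difference in y along the left edge
    return (width, height)
}

let boxSize = size(of: box)
print(boxSize.width, boxSize.height)  // 20 40
```

With a different protocol (say, a different corner order) the same four pairs of numbers would need different arithmetic.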


122


Noodling time:

do we really need all four corners’ (x, y) coordinates to be able to draw the bounding box?
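A sketch of one possible answer: for an upright box, one corner plus a width and a height carry the same information. The `CompactBox` type and its protocol (upper-left corner, then size) are assumptions made for this example.

```swift
// A more compact representation (assumed protocol): upper-left corner plus width and height.
struct CompactBox {
    let x: Int
    let y: Int
    let width: Int
    let height: Int

    // Rebuild all four corners in the order upper-left, upper-right, lower-left, lower-right.
    var corners: [[Int]] {
        return [
            [x, y],
            [x + width, y],
            [x, y + height],
            [x + width, y + height],
        ]
    }
}

let compact = CompactBox(x: 0, y: 0, width: 20, height: 40)
print(compact.corners)  // [[0, 0], [20, 0], [0, 40], [20, 40]]
```

Four numbers are enough here, instead of eight.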


123

some face detection models can do more than just figure out where the outline of a face is…

recall how, when you see a picture of someone’s face in a book, you tend to look into their eyes for a second?


124

some face detection models can do something similar…


they can find the exact locations of eyes and nose-tips, and many more…

these points are called “landmarks”


check what landmarks Apple’s model can find https://developer.apple.com/documentation/vision/vnfacelandmarks2d


125

here is the landmark output of face detection on manga images [2]



126

each landmark is represented by its coordinates

a set of landmarks means a set of coordinates



127

Example of how to represent landmarks in numbers:

Location of right eye mid point: [30, 20]

Location of left eye mid point: [50, 20]

Location of nose tip point: [40, 40]

and other points of interest…

Landmarks: [[30, 20], [50, 20], [40, 40] … etc. ]

Don’t forget the protocol:

Arrays are in the order of right eye mid point, left eye mid point, nose tip point, etc.
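A minimal sketch of applying that protocol in code. The names in `landmarkNames` are made up for this example; only their order matters, because it has to match the order of the points.

```swift
// The protocol as code: the order of the names matches the order of the points.
let landmarkNames = ["rightEye", "leftEye", "noseTip"]
let landmarks = [[30, 20], [50, 20], [40, 40]]

// Attach a name to each [x, y] point by its position in the array.
var named: [String: [Int]] = [:]
for (name, point) in zip(landmarkNames, landmarks) {
    named[name] = point
}

print(named["noseTip"]!)  // [40, 40]
```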

128

what do we need the landmarks for?


the bounding box can only tell if some region has a face, regardless of its rotation


landmarks can tell us the rotation of the face

we need this information to perfectly overlay emoji
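a rough sketch of how two landmarks can give a rotation: measure the angle of the line through the two eye mid points. `rollDegrees` is a hypothetical helper written for this slide, not the textbook's or Apple's method, and the sign of the angle depends on which way the y axis points.

```swift
import Foundation

// Angle (in degrees) of the line through the two eye mid points.
// 0 means the eyes are level; the sign depends on the coordinate convention.
func rollDegrees(rightEye: [Double], leftEye: [Double]) -> Double {
    let dx = leftEye[0] - rightEye[0]
    let dy = leftEye[1] - rightEye[1]
    return atan2(dy, dx) * 180.0 / Double.pi
}

print(rollDegrees(rightEye: [30, 20], leftEye: [50, 20]))  // level eyes
print(rollDegrees(rightEye: [30, 20], leftEye: [50, 40]))  // a tilted face (made-up numbers)
```

the bounding box alone could not tell these two faces apart; the eye positions can.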


129


finding faces may seem trivial for our visual system, but it used to be a hard task for machines

we can locate a face and its landmarks in one go, within a blink; for machines, finding landmarks is another level of difficulty

“simple things are hard”


130


thankfully when coding an iOS app, we just need to call the right function, that’s all


detecting bounding boxes: VNDetectFaceRectanglesRequest()


detecting landmarks: VNDetectFaceLandmarksRequest()


131


By calling VNDetectFaceRectanglesRequest() or VNDetectFaceLandmarksRequest()

We are retrieving the output of Apple’s awesome face detection model

Question for later:

Where do we feed input to the model?
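A rough sketch of one possible answer, not the textbook's code: in Apple's Vision framework the image is handed to a `VNImageRequestHandler`, which then performs the request. `detectFaces(in:)` and `pixelBox` are hypothetical helper names. The Vision part only compiles on Apple platforms, so it is wrapped in `#if canImport`.

```swift
// Vision reports each bounding box in normalized coordinates (0...1) with the
// origin in the lower-left corner. This hypothetical helper converts one box
// to pixels with the origin in the upper-left corner.
func pixelBox(x: Double, y: Double, width: Double, height: Double,
              imageWidth: Double, imageHeight: Double)
    -> (x: Double, y: Double, width: Double, height: Double) {
    let w = width * imageWidth
    let h = height * imageHeight
    let px = x * imageWidth
    let py = (1.0 - y - height) * imageHeight  // flip the y axis
    return (px, py, w, h)
}

#if canImport(Vision) && canImport(UIKit)
import UIKit
import Vision

// Hypothetical helper: returns the faces Vision finds in a UIImage.
func detectFaces(in image: UIImage) -> [VNFaceObservation] {
    guard let cgImage = image.cgImage else { return [] }

    // 1. Describe what we want: face bounding boxes.
    let request = VNDetectFaceRectanglesRequest()

    // 2. Feed the input: the handler wraps the image the request will run on.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

    // 3. Run the model, then read its output from the request.
    do {
        try handler.perform([request])
    } catch {
        print("Face detection failed: \(error)")
        return []
    }
    return (request.results as? [VNFaceObservation]) ?? []
}
#endif
```

Each `VNFaceObservation` carries a `boundingBox`, and `pixelBox` shows the protocol needed to turn those numbers into something we can draw on the image.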


132


end of face detection model introduction

question?


133


construction time

!!!

https://github.com/XiaowanYi/MLOne-DiplomaInAppleDevAW22-Lec-01


134


preparation 1: which Xcode version are you using?


135


preparation 2:

there will be mostly cutting and pasting from the textbook

don’t be scared — you don’t have to comprehend every single line


136


preparation 3:

to get the cutting and pasting right, pay attention to which function or which object you are pasting into

“the scope”


137


let’s start the project by opening Xcode:

Create a new Xcode project

iOS -> App -> Next


138


step a:

select SwiftUI from the Interface dropdown menu

Leave “Use Core Data” and “Include Tests” unchecked (not important for this project)

Click Next and select a folder:

Good practice: create the project folder inside a designated working folder


139


step b:

magic dust 1:


Info -> Custom iOS Target Properties -> “+” on any row -> select “Privacy - Camera Usage Description”


140


step c:

let’s look at textbook P171-173 step 3 & 4

don’t worry about error notifications, they will be resolved as we progress

have you seen the familiar VNDetectFaceRectanglesRequest() ?

141


step d:

let’s move to textbook P174 step 1, 2 & 3

note we are moving to a new file (Views.swift)

this is defining ui buttons we will use later


142


step e:

let’s look at textbook P176-177 step 4, 5 & 6

correction 1:

in step 4, first line: struct Main View

it should be struct MainView (remove the space in between)

correction 2:

in step 4, second line: private let image: Ullmage

it should be private let image: UIImage (both are capital I, not lowercase l)

143


step f:

let’s look at textbook P177-181 step 7

this is a new and long struct, be careful


144


step g:

let’s look at textbook P181-182 step 8

this is for handling rotations

recall: when do we need rotations?


145


step h:

moving to the file ContentView.swift


146


step i:

let’s look at textbook P182-184 step 9, 10, 11

ui stuff

In step 11, the line with:

.navigationBarTitle(Text("FDDemo"),

you can change the text string to be your own app name

147


step j:

let’s look at textbook P184-185 step 12, 13


148


step k:

let’s look at textbook P185-187 step 14, 15, 16

note: from step 14 we go outside the scope of extension ContentView {} and paste the code directly


149


step l:

let’s look at textbook P187-188 step 17, 18, 19

note: from step 14 we go outside the scope of extension ContentView {} and paste the code directly


150


step m:

patch 1:

continue on step 19, add the following function into struct ContentView: View {}

private func controlReturned(image: UIImage?) {
    print("Image return \(image == nil ? "failure" : "success")...")
    self.image = image?.fixOrientation()
    self.faces = nil
}


151


step n:

magic dust 2:

add placeholder and icon to your Assets (drag and drop to Assets in the navigator)


152


building time !!!


153


next: draw the bounding box


154


step o:

let’s look at textbook P190-192 step 1, 2, 3

we are moving to the file Faces.swift

here is the code for drawing the bounding box

you can customise the box colour in step 3:

context.setStrokeColor(UIColor.red.cgColor)


155


step p:

let’s look at textbook P192 step 5

we are updating the entire getFaces() to incorporate the box drawing function (drawOn())


156


building time !!!


157


next: applying emoji on top

we’ll only be working on the file Faces.swift


158


step q:

let’s look at textbook P197-199 step 1

for image rotation


159


step r:

let’s look at textbook P199-203 step 2

recall: landmarks

VNFaceLandmarks2D here represents all of the landmarks that Apple’s Vision framework can detect in a face.


160


step s:

let’s look at textbook P203-207 step 3, 4, 5, 6

adding extensions on the global scope


161


step t:

second last step!

let’s look at textbook P207-210 step 7

it is replacing the entire extension on Collection

basically it replaces the box drawing function with the emoji placing function


162


if no emoji is shown in your editor, you need to copy and paste the list of emojis from lines 149-157 here

https://github.com/AIwithSwift/PracticalAIwithSwift1stEd-Code/blob/master/Chapter%204%20-%20Vision/Face%20Detection/FDDemo-Improved/FDDemo/Faces.swift


163


step u:

finally!!!

in Faces.swift -> extension UIImage {} -> roughly 4th line, change

let request = VNDetectFaceRectanglesRequest()

to

let request = VNDetectFaceLandmarksRequest()

(in order to switch from bounding box detection to landmarks detection)


164


building time !!!


165


congrats

don’t be scared about the code,

this lecture is about understanding the practical side of ML

as long as you get the idea of “using ML model output by calling the right function”


166

a gentle summary


168


References

ref 1: https://www.sciencedirect.com/science/article/abs/pii/S0022537179902007

ref 2: https://www.semanticscholar.org/paper/Facial-Landmark-Detection-for-Manga-Images-Stricker-Augereau/64cac22210861d4e9afb00b781da90cf99f9d19c

image ref https://animevyuh.org/face-detection-using-opencv/

image ref https://support.wolfram.com/25330?src=mathematica


169